ABSTRACT

Empirical evidence is presented related to the effects of using a
stratified sampling of items in multiple matrix sampling on the
accuracy of estimates of the population mean. Data were obtained from
a sample of 600 high school students for a 36-item mathematics test
and a 40-item vocabulary test, both subtests of the Iowa Tests of
Educational Development. The results indicate that a stratified
sampling of items, either by item difficulty level or by item
discriminating ability, does not consistently yield more accurate
estimates of the population mean than does simple random sampling.
(Author/GIH)

The use of multiple matrix sampling (MMS) techniques for program
evaluation purposes has been accepted by educational evaluators as a
means of reducing the time and cost required to perform program
evaluations (Knapp, 1972). The usual MMS procedure involves the
administration of a sample of items drawn from a larger item universe
to a sample of respondents drawn from a population of respondents. On
the basis of this information, estimates of population parameters
[usually the mean ($\mu$) and variance ($\sigma^2$)] are obtained and
used as part of the evaluation data. It is desirable, of course, to
estimate these parameters as accurately as possible.

Many investigators have attempted to identify procedures for
selecting an MMS procedure that will provide the most accurate
estimates of the population parameters [see, for example, Shoemaker
(1970a, 1970b, 1971, 1972, 1973), Knapp (1972), and Barcikowski (1972,
1974)]. The majority of these investigations have been concerned with
the selection of a set of design parameters [i.e., the number of
subtests (t), the number of items per subtest (k), and the number of
examinees per subtest (n)] that will yield accurate estimates of the
mean and variance. Relatively few investigations have examined the
effects of using stratified random sampling of items rather than
simple random sampling of items on the accuracy of these estimates.
Shoemaker (1973) has suggested that when using MMS, items should be
stratified by difficulty level rather than content area. Myerberg
(1975) stratified items by difficulty level and then used MMS
procedures to draw samples from a computer-generated data base.
Myerberg found that stratification by item difficulty did not
consistently result in more stable estimates of the standard errors of
$\hat{\mu}$ and $\hat{\sigma}^2$ than did simple random assignment. It
was found that systematic decreases in the standard error terms
occurred only when concurrent stratification by item difficulty and
content was used.

With regard to item discriminating ability, Barcikowski (1972, 1974)
concluded that discriminating ability, as measured by the biserial
correlation between each item and total test score, does affect the
variability of the estimated mean. Samples drawn from a universe with
biserial correlations in the range .05-.50 resulted in more precise
estimates of the mean than did samples drawn from a universe of items
with biserial correlations in the .40-.70 range. For the variance, it
was found that when the biserial correlations were relatively
homogeneous, the MMS procedures provided more precise estimates of the
variance than traditional examinee-sampling procedures. When the
biserial correlations were relatively heterogeneous, traditional
examinee-sampling provided estimates which were as precise as those
obtained by the MMS procedures.

The primary purpose of this study was to provide additional
empirical evidence related to the effects of using stratified sampling
of items in MMS on the accuracy of estimates of the population mean.
Two methods of stratification were examined: (1) stratification by
item difficulty; and (2) stratification by item discriminating
ability.



PROCEDURES

The method of analysis used in this study was the post mortem
approach in which samples are drawn from a data base with known
parameters. Sample estimates of the parameters of interest are then
computed and compared to the known data base values.

Description of the Data Base

The items used in this study were from two subtests of the Iowa
Tests of Educational Development (ITED): mathematics (36 items) and
vocabulary (40 items). These two subtests were chosen primarily
because the distributions of scores on these subtests were known to be
relatively different. Also, the items on each subtest are independent
(i.e., the response to a given item is not dependent on the responses
to other items).

During 1971, 16,819 ninth grade students in Iowa took the math test
(Form X-6). A 1 in [illegible] systematic sample was drawn to give a
data base with N = 600. For this data base of 600 the following values
were found: $\mu$ = 11.623, $\sigma^2$ = 30.385, $\sigma$ = 5.512,
skewness = .965, and kurtosis = 3.680. The range was 32 with a minimum
score of one and a maximum score of 33. The reliability (KR 20) for
the data base was .779. The distribution was positively skewed and
leptokurtic.

The second data base was derived from the scores of 13,821 eleventh
grade students in Iowa who took the vocabulary test (Form
[illegible]-6) in 1973. A 1 in [illegible] systematic sample was drawn
to construct a data base with N = 600. For this data base the
following values were found: $\mu$ = 22.682, $\sigma^2$ = 88.328,
$\sigma$ = 9.398, skewness = -.076, and kurtosis = 1.971. The range
was 37 with a minimum score of three and a maximum score of 40. The
reliability (KR 20) for the data base was .923. The distribution was
slightly negatively skewed and platykurtic.
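
For readers unfamiliar with the reliability index reported for the two
data bases, the following is a minimal sketch of the Kuder-Richardson
formula 20 for dichotomously scored items. The function name `kr20`,
the toy score matrix, and the use of the population (divide by n)
variance of total scores are assumptions made for illustration; the
text does not describe how the reported values were computed.

```python
def kr20(scores):
    """KR-20 for an n x k matrix of 0/1 item scores (rows = examinees).

    Assumption: the total-score variance uses the divide-by-n form.
    """
    n = len(scores)
    k = len(scores[0])
    totals = [sum(row) for row in scores]
    mean_total = sum(totals) / n
    var_total = sum((x - mean_total) ** 2 for x in totals) / n
    # Sum of item variances p_j * (1 - p_j), where p_j is item difficulty.
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in scores) / n
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / var_total)


# Hypothetical toy data: four examinees, three items.
toy_scores = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(kr20(toy_scores))
```

The same loop also yields the item difficulty indices (the p values)
that a stratification by difficulty would rely on.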

Sampling Procedures

Two methods of sampling were employed to assign items to subtests:
(1) simple random sampling and (2) stratified random sampling. The
data for stratifying the items according to difficulty level came from
norms for the state of Iowa for the year preceding the sample data
collection. For the math test, the normative data came from the 1970
administration, and for the vocabulary test the data came from the
1972 administration. The difficulty indices for the math test ranged
from .[illegible]4 to .67. For the vocabulary test the range was .22
to .78. When stratifying by item discriminating ability, the item
discrimination indices were taken from item tryout information. The
range for these indices was .25 to .73 for the math test and .36 to
.79 for the vocabulary test. [The item discrimination indices are
Flanagan indices. The high and low groups were defined on the basis of
the total test score on the associated subtests of the ITED. See
Flanagan (1939) for further explanation.]

Table 1 lists the sampling plans which were implemented for the
mathematics test and Table 2 lists sampling plans for the vocabulary
test. In these tables, t is the number of subtests, k specifies the
number of items per subtest, and n is the number of respondents per
subtest. IPSS indicates the number of items included in each subtest
from each stratum. NS specifies the number of strata and IPS indicates
the number of items per stratum. Those sampling plans with NS = 1
obviously did not involve stratification. The sampling of items within
strata was done randomly. For example, when sampling from the math
test, if IPSS = 6, NS = 2, and IPS = 18, this indicates that the
36-item math test was divided into two strata of 18 items each. For
purposes of illustration assume that t = 3, k = 12, and n = 20. In
this instance there are three subtests of 12 items each, with six
items randomly selected from each stratum composing a given subtest.

Certain relationships of interest exist among the parameters of the
sampling plan:

$$(t)(k) = (\mathrm{NS})(\mathrm{IPS}) = K \tag{a}$$

where K is the total number of items in the universe, and

$$(\mathrm{IPSS})(\mathrm{NS}) = k \tag{b}$$

Equation (a) shows that the number of subtests multiplied by the
number of items per subtest is equal to the number of strata
multiplied by the number of items per stratum, which is equal to the
number of items in the universe. Equation (b) shows that the number of
items per subtest per stratum multiplied by the number of strata is
equal to the number of items per subtest.
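
The two relationships can be checked mechanically for any proposed
plan. The sketch below is illustrative only; the `SamplingPlan` class
and the `is_consistent` function are hypothetical names introduced
here, not part of the study's software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SamplingPlan:
    t: int     # number of subtests
    k: int     # items per subtest
    n: int     # examinees per subtest
    ipss: int  # items per subtest from each stratum
    ns: int    # number of strata
    ips: int   # items per stratum


def is_consistent(plan: SamplingPlan, K: int) -> bool:
    """Return True when the plan satisfies relationships (a) and (b)."""
    eq_a = plan.t * plan.k == plan.ns * plan.ips == K
    eq_b = plan.ipss * plan.ns == plan.k
    return eq_a and eq_b


# The illustration from the text: 36-item math test, two strata of 18.
example = SamplingPlan(t=3, k=12, n=20, ipss=6, ns=2, ips=18)
print(is_consistent(example, K=36))
```

A check of this kind is a cheap way to catch transcription errors in a
table of plans, since any single wrong cell breaks (a) or (b).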

Each sampling plan was implemented twice: first stratifying by
difficulty level and then restratifying the items according to
discriminating ability. No attempt was made to stratify the items
concurrently by difficulty level and discriminating ability since in
an applied evaluation setting concurrent stratification would most
likely be carried out on the basis of either difficulty level or
discriminating ability and content. Since the items comprising both
data bases are relatively homogeneous with regard to content,
stratification by content did not seem reasonable.

For all sampling plans, the sampling of the item universe was
exhaustive and without replacement of items for the construction of
each subtest. That is, an item assigned to a particular subtest was
not returned to the item pool before constructing the next subtest.
This procedure assured that every item appeared on one subtest and all
items appeared an equal number of times among the subtests, which is
in accordance with the guidelines proposed by Shoemaker (1973). The
sampling of students from the population was done randomly so that
each student's response was included for only one subtest. This was
done because it is unlikely that in an applied evaluation setting a
given student would respond to more than one subtest while some
students would not respond to any subtest. Therefore, the sampling of
both items and students was exhaustive since all items in the universe
and all students in the population were utilized.
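
The assignment procedure described above can be sketched as follows.
This is a minimal illustration, not the program used in the study: the
function names are hypothetical, and it assumes that strata are formed
by cutting a list of items, already sorted by the stratifying index,
into NS equal consecutive blocks. Simple random sampling corresponds
to a single stratum (NS = 1).

```python
import random


def make_strata(items_sorted, ns):
    """Cut items (pre-sorted by difficulty or discrimination) into ns strata."""
    ips = len(items_sorted) // ns
    return [items_sorted[i * ips:(i + 1) * ips] for i in range(ns)]


def assign_items(strata, t, ipss, seed=None):
    """Deal every item to exactly one of t subtests, IPSS per stratum."""
    rng = random.Random(seed)
    subtests = [[] for _ in range(t)]
    for stratum in strata:
        if len(stratum) != t * ipss:
            raise ValueError("each stratum must hold exactly t * IPSS items")
        shuffled = list(stratum)
        rng.shuffle(shuffled)  # random order, no replacement
        for s in range(t):
            subtests[s].extend(shuffled[s * ipss:(s + 1) * ipss])
    return subtests


def assign_examinees(population_size, t, n, seed=None):
    """Draw t disjoint groups of n examinee indices at random."""
    rng = random.Random(seed)
    chosen = rng.sample(range(population_size), t * n)
    return [chosen[s * n:(s + 1) * n] for s in range(t)]


# Illustration: 36 items, NS = 2, IPSS = 6, t = 3 (so k = 12), n = 20.
items = list(range(36))  # assumed to be sorted by the stratifying index
subtests = assign_items(make_strata(items, ns=2), t=3, ipss=6, seed=1)
groups = assign_examinees(600, t=3, n=20, seed=1)
print([len(s) for s in subtests], [len(g) for g in groups])
```

Shuffling each stratum once and slicing it guarantees that no item is
returned to the pool and that every item lands on exactly one subtest.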

Indices of Accuracy

In most evaluation studies the major parameter of interest is the
population mean, $\mu$. This study was concerned with the accuracy of
different methods of estimating $\mu$. The actual estimate of the
population mean was accomplished as follows. First, for each subtest,
$\mu$ was estimated using the following formula [Shoemaker (1973),
p. 27]:

$$\hat{\mu}_s = \frac{K \sum_{i=1}^{n} \sum_{j=1}^{k} X_{ij}}{nk} \tag{1}$$

where $\hat{\mu}_s$ = the estimated mean universe score for subtest s
      K = the number of items in the universe
      n = the number of examinees who respond to each subtest
      k = the number of items per subtest
      $X_{ij}$ = the observed score for individual i on item j.

Then, the estimates from each subtest were pooled using the formula
below [Shoemaker (1973), p. 38] to provide a single estimate of the
universe mean:

$$\hat{\mu}_p = \frac{\sum_{s=1}^{t} O_s \hat{\mu}_s}{\sum_{s=1}^{t} O_s} \tag{2}$$

where $\hat{\mu}_p$ = the pooled estimate of the population mean
      t = the number of subtests
      $O_s = n_s k_s$, the number of observations per subtest.
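
The two estimation steps can be sketched in a few lines. The function
names and the toy data are hypothetical, and the pooled estimate is
written here as the observation-weighted mean of the subtest
estimates, which is an assumption about the exact form of Equation
(2).

```python
def subtest_estimate(scores, K):
    """Equation (1): scores is an n x k matrix of item scores for one subtest."""
    n = len(scores)
    k = len(scores[0])
    total = sum(sum(row) for row in scores)
    return K * total / (n * k)


def pooled_estimate(estimates, observations):
    """Equation (2), assumed form: mean of subtest estimates weighted by O_s."""
    weighted = sum(o * m for o, m in zip(observations, estimates))
    return weighted / sum(observations)


# Hypothetical toy data: two subtests, 2 examinees x 3 items, K = 36.
subtest_a = [[1, 0, 1], [1, 1, 0]]
subtest_b = [[0, 0, 1], [0, 1, 0]]
est = [subtest_estimate(subtest_a, K=36), subtest_estimate(subtest_b, K=36)]
print(est, pooled_estimate(est, observations=[6, 6]))
```

When every subtest has the same n and k, as in the plans studied here,
the weights are equal and the pooled estimate reduces to the simple
average of the subtest estimates.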



The accuracy of these $\hat{\mu}_p$ estimates was examined using two
somewhat related indices.

The first of these was labeled SE($\hat{\mu}_p$) and was computed as
follows:

$$SE(\hat{\mu}_p) = \left[ \frac{\sum_{i=1}^{NREPS} (\hat{\mu}_{p,i} - \bar{\hat{\mu}}_p)^2}{NREPS - 1} \right]^{1/2} \tag{3}$$

where $\hat{\mu}_{p,i}$ is the pooled estimate from replication i and
$\bar{\hat{\mu}}_p$ is the mean over replications of the
$\hat{\mu}_p$ values.

The second index of accuracy was defined as follows:

$$\Psi = \frac{\sum_{i=1}^{NREPS} |\hat{\mu}_{p,i} - \mu|}{NREPS} \tag{4}$$

SE($\hat{\mu}_p$) indicates how closely the estimates of $\mu$
cluster around the average pooled estimate ($\bar{\hat{\mu}}_p$), and
$\Psi$ indicates how closely the estimates of $\mu$ cluster around the
true data base mean. Although these two indices are somewhat
different, they are highly related since the $\hat{\mu}_p$ values are
unbiased estimators of $\mu$.
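
A minimal sketch of the two accuracy indices follows. The function
names and the replication values are hypothetical. The second index is
coded as the mean absolute deviation from the known data base mean,
which is an assumption, since the printed formula is only partly
legible.

```python
import math


def se_pooled(estimates):
    """Equation (3): standard deviation of pooled estimates over replications."""
    nreps = len(estimates)
    mean = sum(estimates) / nreps
    return math.sqrt(sum((e - mean) ** 2 for e in estimates) / (nreps - 1))


def psi(estimates, mu):
    """Equation (4), assumed form: mean absolute deviation from the true mean."""
    return sum(abs(e - mu) for e in estimates) / len(estimates)


# Hypothetical pooled estimates from three replications of one plan.
replications = [11.0, 12.0, 13.0]
print(se_pooled(replications), psi(replications, mu=11.5))
```

The first index measures spread around the replications' own average,
while the second is anchored to the known data base value, so a plan
whose estimates are tightly clustered but off target would score well
on the first and poorly on the second.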

Both the SE($\hat{\mu}_p$) and the $\Psi$ values were used to compare
the stratified and nonstratified MMS plans. For all comparisons, the
values of t, k, and n were held constant. For example, in Table 3,
plan 1 (nonstratified) was compared with plan 2 (stratified), but
plan 2 was not compared with plan 3 (nonstratified) because the latter
two plans involve different values of t, k, and n.

RESULTS AND CONCLUSIONS

The results of the various sampling procedures are listed in Tables
3, 4, 6, and 7. In these tables the design parameters of the sampling
plans are specified by the number of subtests (t), the number of items
per subtest (k), the number of examinees per subtest (n), the total
number of observations (O), the number of items per subtest from each
stratum (IPSS), the number of strata (NS), and the number of items per
stratum (IPS). The column labeled $\hat{\mu}_p$ is the estimated value
of $\mu$ pooled over replications, SE($\hat{\mu}_p$) indicates the
standard error of $\hat{\mu}_p$ [Equation (3)], and $\Psi$ is defined
by Equation (4).

The column headed STRAT indicates the method of stratification, with
NO indicating that the items were not stratified, DIFF indicating
stratification by level of item difficulty, and DISC showing that
items were stratified by item discriminating ability. The last column
shows the number of replications of the sampling plan (NREPS).

The results for stratification by difficulty level of the math test
are presented in Table 3 and the results for the vocabulary test are
contained in Table 4. As noted previously, stratified plans were
compared with nonstratified plans holding t, k, and n constant. For
example, considering the math test stratified by difficulty level,
plan 1 (Table 3) was compared with plan 2, plan 3 with plans 4 and 5,
plan 6 with plans 7 through 9, plan 10 with plans 11 and 12, and
plan 13 with plans 14 through 18. The results of these comparisons are
summarized in Table 5, where the numbers in the table indicate which
type of sampling plan yielded the smaller value for each of the two
statistics used as a basis for comparison. For example, with
SE($\hat{\mu}_p$) as the criterion, eight of the 13 comparisons show
that the stratified plans produced smaller values of SE($\hat{\mu}_p$)
than the comparable nonstratified plans.
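
The tallying behind such a summary can be sketched as below. The SE
values are those that can be read from Table 3 for plans 3 through 18;
the plan 1 versus plan 2 comparison is left out because the SE entry
for plan 1 is not legible, so the sketch covers 12 of the 13
comparisons. The dictionary layout and the function name are
hypothetical.

```python
# Keyed by (t, k): (SE of the nonstratified plan, SEs of the stratified plans).
# Values read from Table 3, mathematics test, stratification by difficulty.
se_by_design = {
    (9, 4): (0.618, [0.250, 0.568]),
    (6, 6): (0.571, [0.426, 0.792, 0.539]),
    (4, 9): (0.525, [0.920, 0.519]),
    (3, 12): (0.728, [0.676, 0.843, 0.827, 0.482, 0.580]),
}


def tally_stratified_wins(groups):
    """Count comparisons in which the stratified plan has the smaller value."""
    wins = 0
    total = 0
    for baseline, stratified in groups.values():
        for value in stratified:
            total += 1
            if value < baseline:
                wins += 1
    return wins, total


print(tally_stratified_wins(se_by_design))  # (8, 12)
```

Each stratified plan is compared only with the nonstratified plan that
shares its values of t, k, and n, which is why the data are grouped by
design before counting.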

The data in Tables 3, 4, and 5 do not provide conclusive evidence
favoring stratification by difficulty level when assigning items to
subtests. These results generally support Myerberg's (1975) contention
that stratified random sampling of items by item difficulty does not
necessarily result in more accurate estimates of the mean than simple
random sampling of items.

The results of stratification by item discriminating ability for the
math test are listed in Table 6 and the results for the vocabulary
test are presented in Table 7. Table 8 summarizes the comparisons
between the stratified and nonstratified designs. Again, neither type
of sampling plan consistently resulted in more accurate estimates of
$\mu$. However, there was a slight tendency for the stratified
sampling plans for the vocabulary test to produce more accurate
estimates than the simple random sampling plans.






Concluding Statement

Generalizations from the results of this study must be made very
cautiously. Only two item universes were studied. Furthermore, the
number of replications used to estimate the accuracy of the two MMS
procedures was extremely small for studies of this type. Nonetheless,
these results do provide additional data related to the effects of
using item stratification procedures in MMS. In general, these results
indicate that stratified sampling of items, either by item difficulty
level or by item discriminating ability, does not consistently yield
more accurate estimates of $\mu$ than does simple random sampling.

TABLE 1

Stratified Sampling Plans Implemented
for the Mathematics Test

Plan    t     k     n    IPSS   NS*   IPS   NREPS**
  1    12     3    20      3     1     36     10
  2    12     3    20      1     3     12      5
  3     9     4    20      4     1     36     10
  4     9     4    20      1     4      9      5
  5     9     4    20      2     2     18      5
  6     6     6    20      6     1     36     10
  7     6     6    20      1     6      6      5
  8     6     6    20      2     3     12      5
  9     6     6    20      3     2     18      5
 10     4     9    20      9     1     36     10
 11     4     9    20      1     9      4      5
 12     4     9    20      3     3     12      5
 13     3    12    20     12     1     36     10
 14     3    12    20      1    12      3      5
 15     3    12    20      2     6      6      5
 16     3    12    20      3     4      9      5
 17     3    12    20      4     3     12      5
 18     3    12    20      6     2     18      5

*  Sampling plans with NS = 1 did not involve stratification.

** The difference in NREPS between those plans with NS = 1 and
   NS > 1 is due to the need to reduce computer costs.

TABLE 2

Stratified Sampling Plans Implemented
for the Vocabulary Test

Plan    t     k     n    IPSS   NS*   IPS   NREPS
  1     4    10    20     10     1     40      5
  2     4    10    20      1    10      4      5
  3     4    10    20      2     5      8      5
  4     4    10    20      5     2     20      5
  5     5     8    20      8     1     40      5
  6     5     8    20      1     8      5      5
  7     5     8    20      2     4     10      5
  8     5     8    20      4     2     20      5
  9     8     5    20      5     1     40      5
 10     8     5    20      1     5      8      5
 11    10     4    20      4     1     40      5
 12    10     4    20      1     4     10      5
 13    10     4    20      2     2     20      5

* Sampling plans with NS = 1 did not involve stratification.

TABLE 3

Results of Stratification by Difficulty Level
in Assigning Items to Subtests - Mathematics ($\mu$ = 11.623)

Plan    t    k    n     O   IPSS   NS   IPS   $\hat{\mu}_p$   SE($\hat{\mu}_p$)   $\Psi$   STRAT   NREPS
  1    12    3   20   720     3     1    36      11.620         [illegible]       .399     NO       10
  2    12    3   20   720     1     3    12      12.000            .965           .846     DIFF      5
  3     9    4   20   720     4     1    36      11.320            .618           .489     NO       10
  4     9    4   20   720     1     4     9      11.950            .250           .356     DIFF      5
  5     9    4   20   720     2     2    18      11.920            .568           .486     DIFF      5
  6     6    6   20   720     6     1    36      11.495            .571           .465     NO       10
  7     6    6   20   720     1     6     6      11.490            .426           .305     DIFF      5
  8     6    6   20   720     2     3    12      11.810            .792           .705     DIFF      5
  9     6    6   20   720     3     2    18      11.750            .539           .385     DIFF      5
 10     4    9   20   720     9     1    36      11.595            .525           .450     NO       10
 11     4    9   20   720     1     9     4      11.580            .920           .775     DIFF      5
 12     4    9   20   720     3     3    12      12.200            .519           .577     DIFF      5
 13     3   12   20   720    12     1    36      11.745            .728           .576     NO       10
 14     3   12   20   720     1    12     3      11.220            .676           .594     DIFF      5
 15     3   12   20   720     2     6     6      11.680            .843           .635     DIFF      5
 16     3   12   20   720     3     4     9      12.060            .827           .726     DIFF      5
 17     3   12   20   720     4     3    12      11.960            .482           .435     DIFF      5
 18     3   12   20   720     6     2    18      11.310            .580           .515     DIFF      5

TABLE 4

Results of Stratification by Difficulty Level
in Assigning Items to Subtests - Vocabulary ($\mu$ = 22.682)

Plan    t    k    n     O   IPSS   NS   IPS   $\hat{\mu}_p$   SE($\hat{\mu}_p$)   $\Psi$   STRAT   NREPS
  1     4   10   20   800    10     1    40      22.080           1.403          1.236     NO        5
  2     4   10   20   800     1    10     4      21.780           1.106          1.229     DIFF      5
  3     4   10   20   800     2     5     8      21.980           1.667          1.496     DIFF      5
  4     4   10   20   800     5     2    20      21.930